Seminar Presentation 


Expression Templates 
vs. 

Smart Expression Templates 


by Christian Fischer 


Overview 


• General overview of ET 

• ET and Op-Overloading explained 

• Performance of ET and Op-Overload in comparison 

• Limits and problems that come with ETs 

• Introduction of SET to handle them 

• Two examples of SET solutions to ET problems 

• Comparison and Benchmark of SET-using libraries 
What are ETs? 


Introduced by Veldhuizen in 1995 

Optimization for array-based operations 

Intended to avoid heavy and unnecessary use of temporary 
variables 

Able to reach performance similar to C-like operations: 
for (int i = 0; i < a.size(); i++) { 
    d[i] = a[i] + b[i] + c[i]; 
} 
The "Classic" Way 


• Addition realised by 
operator-overloading 

• Needs a temporary 
variable for addition 

* De-/allocation and copy- 
operation needed 

* Slow on large vectors 

* Used memory blocked 
for other operations 

• Improvement needed! 


#include <algorithm>  // std::copy

template<typename T>
class Vector {
public:
    // Constructor, destructor, other basics...

    Vector& operator=(const Vector& rhs) {
        if (&rhs == this) return *this;
        std::copy(rhs._value, rhs._value + size(), this->_value);
        return *this;
    }
    int size() const { return this->_size; }
    T& operator[](int i) { return _value[i]; }
    const T& operator[](int i) const { return _value[i]; }

private:
    int _size; T* _value;
};

template<typename T>
const Vector<T> operator+(const Vector<T>& left,
                          const Vector<T>& right) {
    Vector<T> tmp(left.size());  // temporary that holds the result
    for (int i = 0; i < right.size(); i++)
        tmp[i] = left[i] + right[i];
    return tmp;
}
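
To make the cost of the classic approach visible, the following minimal sketch counts how many temporary vectors are created for d = a + b + c. The names `CVec`, `g_temporaries` and `temporaries_for_sum_of_three` are assumptions made for this illustration; storage uses std::vector instead of a raw pointer to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustration only: counts how many temporary vectors operator+ creates.
static int g_temporaries = 0;

class CVec {
public:
    explicit CVec(std::size_t n, double v = 0.0) : d_(n, v) {}
    std::size_t size() const { return d_.size(); }
    double& operator[](std::size_t i) { return d_[i]; }
    double operator[](std::size_t i) const { return d_[i]; }
private:
    std::vector<double> d_;
};

// Classic operator overloading: every + allocates a result and runs a loop.
inline CVec operator+(const CVec& l, const CVec& r) {
    assert(l.size() == r.size());
    CVec tmp(l.size());
    ++g_temporaries;
    for (std::size_t i = 0; i < l.size(); ++i)
        tmp[i] = l[i] + r[i];
    return tmp;
}

// Hypothetical helper: evaluates d = a + b + c and reports the temporaries.
inline int temporaries_for_sum_of_three() {
    CVec a(8, 1.0), b(8, 2.0), c(8, 3.0), d(8);
    g_temporaries = 0;
    d = a + b + c;
    assert(d[0] == 6.0);
    return g_temporaries;
}
```

With two additions in the expression, two temporaries and two loops are needed before the final copy into d.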
Solution: Expression Templates (ET) 


Don't execute the expression 
until it is assigned to a target 
-> prevents expensive 
memory operations 

Create a "placeholder" for 
the expression -> syntax tree 

Temporaries are just references 

Only one for-loop, even 
with many additions in a row 
-> classic: #adds = #loops 


template<typename A, typename B>
class Sum {
public:
    explicit Sum(const A& l, const B& r) : _left(l), _right(r) {}
    int size() const { return this->_left.size(); }
    double operator[](int i) const
    { return this->_left[i] + this->_right[i]; }
private:
    const A& _left; const B& _right;
};

template<typename A, typename B>
const Sum<A, B> operator+(const A& a, const B& b) {
    return Sum<A, B>(a, b);
}

template<typename T>
class Vector {
public:
    // Constructor, destructor, other basics...

    // Assigning an expression evaluates it in a single loop
    template<typename Expr>
    Vector& operator=(const Expr& expr) {
        for (int i = 0; i < expr.size(); i++)
            _value[i] = expr[i];
        return *this;
    }
    int size() const { return this->_size; }
    T& operator[](int i) { return _value[i]; }
    const T& operator[](int i) const { return _value[i]; }
private:
    int _size; T* _value;
};
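
The two classes above can be combined into a small self-contained sketch. The names `et::Vec`, `et::Sum` and `sum3` are assumptions for this illustration and do not come from any library; the operator is placed in a namespace so that the unconstrained template is only found for the types defined there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace et {

// Expression node: stores references only and computes a[i] + b[i] on demand.
template <typename A, typename B>
class Sum {
public:
    Sum(const A& l, const B& r) : left_(l), right_(r) {}
    std::size_t size() const { return left_.size(); }
    double operator[](std::size_t i) const { return left_[i] + right_[i]; }
private:
    const A& left_;
    const B& right_;
};

class Vec {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }

    // Assigning an expression runs the only loop of the whole statement.
    template <typename A, typename B>
    Vec& operator=(const Sum<A, B>& expr) {
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }
private:
    std::vector<double> data_;
};

// Building the expression costs no loop and no allocation.
template <typename A, typename B>
Sum<A, B> operator+(const A& a, const B& b) { return Sum<A, B>(a, b); }

}  // namespace et

// Hypothetical helper for illustration: d = a + b + c in a single pass.
inline et::Vec sum3(const et::Vec& a, const et::Vec& b, const et::Vec& c) {
    et::Vec d(a.size());
    d = a + b + c;  // type: Sum<Sum<Vec, Vec>, Vec>, evaluated on assignment
    return d;
}
```

The expression nodes hold references to temporaries, so in this sketch an expression must be assigned within the statement that builds it.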
Example: d = a + b + c 


Classic: 
Vector<double> = Vector<double> + Vector<double> + Vector<double> 
Each + runs its own for-loop and produces a temporary Vector<double>. 

ET: 
a + b has type Sum<Vector<double>, Vector<double> > 
(a + b) + c has type Sum<Sum<Vector<double>, Vector<double> >, Vector<double> > 
"Classic" vs. ET - Memory 


Adding 5 vectors with 33 million entries each (7 times in total) 

Figure: memory and swap history from the system monitor, classic vs. expression template 
Memory readings: 1.0 GiB (26.9%) of 3.8 GiB and 546.1 MiB (13.6%) of 3.9 GiB 
"Classic" vs. ET - Runtime 


Figure: runtime of ET vs. classic, normalised so that 1 = C-like addition 
Line plot: f = a + b + c + d + e, runtime over vector size (0 to 100000) 
Bar charts: c = a + b and f = a + b + c + d + e, runtime on a scale from 0 to 5 
Limits of ETs - Examples 


• ETs have big advantages for simple expressions 

• More complex expressions may lead to disadvantages 

Necessary temporaries 

• Expression is evaluated at assignment 

• Addition is repeated for every row of the matrix 

• Avoidable by using a temporary variable 

Vec<double> = Mat<double> * (Vec<double> + Vec<double>) 
Vec<double> + Vec<double> has type Sum<Vec<double>, Vec<double> > 
The whole right-hand side has type Mul<Mat<double>, Sum<Vec<double>, Vec<double> > > 
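
The repeated addition can be made visible by counting. The sketch below is a hand-written stand-in for what the expression tree does, not an ET implementation; `Counted`, `multiply_lazy` and `multiply_with_temp` are names assumed for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;  // row-major, square for simplicity

struct Counted {
    Vec y;                    // result of M * (a + b)
    std::size_t vector_adds;  // how often an element of a + b was computed
};

// What a plain ET does: (a + b)[j] is recomputed inside every row.
inline Counted multiply_lazy(const Mat& M, const Vec& a, const Vec& b) {
    Counted r = { Vec(M.size(), 0.0), 0 };
    for (std::size_t i = 0; i < M.size(); ++i)
        for (std::size_t j = 0; j < a.size(); ++j) {
            r.y[i] += M[i][j] * (a[j] + b[j]);
            ++r.vector_adds;
        }
    return r;
}

// With a temporary: a + b is evaluated once and reused by every row.
inline Counted multiply_with_temp(const Mat& M, const Vec& a, const Vec& b) {
    Counted r = { Vec(M.size(), 0.0), 0 };
    Vec t(a.size());
    for (std::size_t j = 0; j < a.size(); ++j) {
        t[j] = a[j] + b[j];
        ++r.vector_adds;
    }
    for (std::size_t i = 0; i < M.size(); ++i)
        for (std::size_t j = 0; j < t.size(); ++j)
            r.y[i] += M[i][j] * t[j];
    return r;
}
```

For an n x n matrix the lazy variant performs n * n element additions of a + b, the variant with a temporary only n.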
Limits of ETs - Examples 

• ETs have big advantages for simple expressions 

• More complex expressions may lead to disadvantages 

• Standard evaluation leads to a temporary variable 

• Avoidable by using a variable that is used anyway 

Evaluation strategies 

A = B + C * D 

Left-to-right evaluation: 

1 | Temp = C * D 

2 | A = B + Temp 

Better strategy: 

1 | A = B 

2 | A += C * D 
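
The two strategies above can be written out by hand for small dense matrices. This is only a sketch under the assumption of square matrices stored as nested std::vector; the function names `eval_with_temp` and `eval_add_assign` are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Mat;  // square matrices assumed

// Left-to-right: Temp = C * D, then A = B + Temp (one extra n x n matrix).
inline Mat eval_with_temp(const Mat& B, const Mat& C, const Mat& D) {
    const std::size_t n = B.size();
    Mat temp(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                temp[i][j] += C[i][k] * D[k][j];
    Mat A(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            A[i][j] = B[i][j] + temp[i][j];
    return A;
}

// Better: A = B, then A += C * D (products accumulate directly into A).
inline Mat eval_add_assign(const Mat& B, const Mat& C, const Mat& D) {
    const std::size_t n = B.size();
    Mat A = B;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                A[i][j] += C[i][k] * D[k][j];
    return A;
}
```

Both functions compute the same A; the second one uses the target, which is needed anyway, as the accumulator and so avoids the extra matrix.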
Limits of ETs - Examples 

• ETs have big advantages for simple expressions 

• More complex expressions may lead to disadvantages 

• Standard evaluation leads to a matrix-matrix multiplication 

• Avoidable by restructuring the expression 

Expression restructuring 

Mat * Mat * vec 

Left-to-right evaluation: 
(Mat * Mat) * vec -> Mat * vec 
O(n^3) + O(n^2) = O(n^3) 

Restructured: 
Mat * (Mat * vec) -> Mat * vec 
O(n^2) + O(n^2) = O(n^2) 
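
The difference in complexity can be checked by counting scalar multiplications for both evaluation orders. The sketch assumes square matrices; `Counted`, `mat_vec`, `mat_mat`, `left_to_right` and `restructured` are names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;  // square matrices assumed

struct Counted {
    Vec y;              // result of A * B * v
    std::size_t mults;  // number of scalar multiplications performed
};

inline Vec mat_vec(const Mat& M, const Vec& v, std::size_t& mults) {
    Vec y(M.size(), 0.0);
    for (std::size_t i = 0; i < M.size(); ++i)
        for (std::size_t j = 0; j < v.size(); ++j) {
            y[i] += M[i][j] * v[j];
            ++mults;
        }
    return y;
}

inline Mat mat_mat(const Mat& A, const Mat& B, std::size_t& mults) {
    const std::size_t n = A.size();
    Mat C(n, Vec(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k) {
                C[i][j] += A[i][k] * B[k][j];
                ++mults;
            }
    return C;
}

// (A * B) * v: n^3 + n^2 multiplications.
inline Counted left_to_right(const Mat& A, const Mat& B, const Vec& v) {
    Counted r = { Vec(), 0 };
    Mat AB = mat_mat(A, B, r.mults);
    r.y = mat_vec(AB, v, r.mults);
    return r;
}

// A * (B * v): n^2 + n^2 multiplications.
inline Counted restructured(const Mat& A, const Mat& B, const Vec& v) {
    Counted r = { Vec(), 0 };
    Vec Bv = mat_vec(B, v, r.mults);
    r.y = mat_vec(A, Bv, r.mults);
    return r;
}
```

For n = 3 the left-to-right order needs 27 + 9 multiplications and the restructured order 9 + 9, with identical results.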
Limits of ETs - Examples 

• ETs have big advantages for simple expressions 

• More complex expressions may lead to disadvantages 

• ET: evaluation at assignment -> no temporary 

• Manipulates values that are still required 

Aliasing 

x = A * x 
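
A short sketch of why x = A * x goes wrong without a temporary: the naive version writes x[i] while later rows still need the old value. The function names `multiply_in_place_naive` and `multiply_via_temp` are assumptions for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;

// Evaluation at assignment without a temporary: x[i] is overwritten
// although the following rows still read the old x[i].
inline void multiply_in_place_naive(const Mat& A, Vec& x) {
    for (std::size_t i = 0; i < A.size(); ++i) {
        double s = 0.0;
        for (std::size_t j = 0; j < x.size(); ++j)
            s += A[i][j] * x[j];
        x[i] = s;
    }
}

// Safe variant: evaluate into a temporary, then swap it into x.
inline void multiply_via_temp(const Mat& A, Vec& x) {
    Vec tmp(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            tmp[i] += A[i][j] * x[j];
    x.swap(tmp);
}
```

With the permutation matrix A = ((0, 1), (1, 0)) and x = (1, 2), the correct result is (2, 1), while the naive version yields (2, 2).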






Solution: Smart-Expression Templates (SET) 


Idea behind SET: 
distinguish between expressions and 
plain objects in operations 

ResultType: 
- Type after evaluation 

CompositeType: 
- Type if used in another expression 

SETs take advantage of this 
information, e.g. to create 
temporary variables 


template <typename MT, typename VT>
class MatVecMultExpr : public Vector<MatVecMultExpr<MT, VT> >,
                       private Expression {
private:
    // Helper types come first: typedef names must be declared before use

    // Result type of the left-hand side dense matrix expression
    typedef typename MT::ResultType MRT;

    // Result type of the right-hand side dense vector expression
    typedef typename VT::ResultType VRT;

    // Element type of the left-hand side dense matrix expression
    typedef typename MRT::ElementType MET;

    // Element type of the right-hand side dense vector expression
    typedef typename VRT::ElementType VET;

public:
    // Result type for expression template evaluations
    typedef typename MathTrait<MRT, VRT>::MultType ResultType;

    // Data type for composite expression templates
    typedef const MatVecMultExpr& CompositeType;

    // Resulting element type
    typedef typename ResultType::ElementType ElementType;

    // Member data type of the left-hand side dense matrix expression
    typedef typename MT::CompositeType Lhs;

    // Member data type of the right-hand side dense vector expression:
    // an expression is stored evaluated (VRT), a plain vector by reference
    typedef typename SelectType<IsExpression<VT>::value,
                                const VRT, const VT&>::Type Rhs;

    inline const ElementType operator[](size_t index) const;
    inline size_t size() const;

    // ...

private:
    // ...

    Lhs mat_;  // Left-hand side dense matrix of the mult. expr.
    Rhs vec_;  // Right-hand side dense vector of the mult. expr.

    // ...
};
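
The selection of the operand type can be sketched with standard C++11 type traits. This is a simplified illustration, not the code of any library: `Operand` plays the role of the SelectType/IsExpression pair with std::conditional and std::is_base_of, and `DVec`, `AddExpr`, `MatVecMult`, `g_add_evals` and `add_evals_for_product` are assumed names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

struct Expression {};                // tag: marks expression nodes
static std::size_t g_add_evals = 0;  // illustration only: AddExpr element evaluations

class DVec;

template <typename L, typename R>
class AddExpr : public Expression {
public:
    typedef DVec ResultType;
    AddExpr(const L& l, const R& r) : l_(l), r_(r) {}
    std::size_t size() const { return l_.size(); }
    double operator[](std::size_t i) const { ++g_add_evals; return l_[i] + r_[i]; }
private:
    const L& l_;
    const R& r_;
};

class DVec {
public:
    typedef DVec ResultType;
    explicit DVec(std::size_t n, double v = 0.0) : d_(n, v) {}
    template <typename L, typename R>
    DVec(const AddExpr<L, R>& e) : d_(e.size()) {  // evaluates the expression once
        for (std::size_t i = 0; i < d_.size(); ++i) d_[i] = e[i];
    }
    std::size_t size() const { return d_.size(); }
    double operator[](std::size_t i) const { return d_[i]; }
    double& operator[](std::size_t i) { return d_[i]; }
private:
    std::vector<double> d_;
};

// Expressions are stored as their evaluated ResultType, plain vectors by reference.
template <typename T>
struct Operand {
    typedef typename std::conditional<std::is_base_of<Expression, T>::value,
                                      const typename T::ResultType,
                                      const T&>::type type;
};

typedef std::vector<std::vector<double> > Mat;

template <typename VT>
class MatVecMult : public Expression {
public:
    typedef DVec ResultType;
    MatVecMult(const Mat& m, const VT& v) : mat_(m), vec_(v) {}
    std::size_t size() const { return mat_.size(); }
    double operator[](std::size_t i) const {
        double s = 0.0;
        for (std::size_t j = 0; j < vec_.size(); ++j) s += mat_[i][j] * vec_[j];
        return s;
    }
private:
    const Mat& mat_;
    typename Operand<VT>::type vec_;  // temporary DVec if VT is an expression
};

// Hypothetical helper: evaluates M * (a + b) and reports the additions of a + b.
inline std::size_t add_evals_for_product(std::size_t n) {
    Mat m(n, std::vector<double>(n, 1.0));
    DVec a(n, 1.0), b(n, 2.0);
    g_add_evals = 0;
    AddExpr<DVec, DVec> sum(a, b);
    MatVecMult<AddExpr<DVec, DVec> > prod(m, sum);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += prod[i];
    assert(total == 3.0 * n * n);
    return g_add_evals;
}
```

Because the sum operand is stored as an evaluated vector, each element of a + b is computed once when the product expression is built rather than once per matrix row.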
Example: Detecting Aliasing 


Aliasing is potentially dangerous: 
it permits incorrect calculations 

Implement isAliased(...) 
in every vector, matrix and expression 

The assignment operator checks 
whether aliasing is present 

If so, it evaluates the rhs into a 
temporary variable and swaps with it 


template <typename MT, typename VT>
class MatVecMultExpr : public Vector<MatVecMultExpr<MT, VT> >,
                       private Expression {
public:
    // ...
    template <typename T>
    inline bool isAliased(const T* alias) const {
        return vec_.isAliased(alias);  // forward to the vector operand
    }
    // ...
private:
    // ...
    Lhs mat_;
    Rhs vec_;
    // ...
};

// in the Vector class
template <typename T>
template <typename O>
inline bool Vector<T>::isAliased(const O* alias) const {
    return static_cast<const void*>(this) ==
           static_cast<const void*>(alias);
}
// ...
// in the overloaded assignment operator of the Vector class
if (rhs.isAliased(this)) {
    Vector tmp(rhs);  // evaluate into a temporary first
    swap(tmp);
}
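
The excerpts above can be assembled into a small runnable sketch. It is simplified on purpose: `Vec` and `MatVecMult` are non-template stand-ins assumed for this illustration, and the else branch for the non-aliased case is an addition that the excerpt does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Mat;

class MatVecMult;  // expression node, defined below

class Vec {
public:
    explicit Vec(std::size_t n, double v = 0.0) : d_(n, v) {}
    explicit Vec(const MatVecMult& e);      // evaluates the expression
    Vec& operator=(const MatVecMult& rhs);  // checks for aliasing first

    // A plain vector is aliased only with itself (address comparison).
    template <typename O>
    bool isAliased(const O* alias) const {
        return static_cast<const void*>(this) == static_cast<const void*>(alias);
    }
    void swap(Vec& o) { d_.swap(o.d_); }
    std::size_t size() const { return d_.size(); }
    double operator[](std::size_t i) const { return d_[i]; }
    double& operator[](std::size_t i) { return d_[i]; }
private:
    std::vector<double> d_;
};

class MatVecMult {
public:
    MatVecMult(const Mat& m, const Vec& v) : mat_(m), vec_(v) {}
    std::size_t size() const { return mat_.size(); }
    double operator[](std::size_t i) const {
        double s = 0.0;
        for (std::size_t j = 0; j < vec_.size(); ++j) s += mat_[i][j] * vec_[j];
        return s;
    }
    // The expression forwards the question to its vector operand.
    template <typename T>
    bool isAliased(const T* alias) const { return vec_.isAliased(alias); }
private:
    const Mat& mat_;
    const Vec& vec_;
};

inline Vec::Vec(const MatVecMult& e) : d_(e.size()) {
    for (std::size_t i = 0; i < d_.size(); ++i) d_[i] = e[i];
}

inline Vec& Vec::operator=(const MatVecMult& rhs) {
    if (rhs.isAliased(this)) {
        Vec tmp(rhs);  // evaluate while the old values are still intact
        swap(tmp);
    } else {
        for (std::size_t i = 0; i < d_.size(); ++i) d_[i] = rhs[i];
    }
    return *this;
}
```

With this check, x = MatVecMult(A, x) pays for a temporary only when the target really appears on the right-hand side.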
Tools using SETs 


Overview of which libraries handle the ET problems shown 



                           Eigen3   Boost   uBLAS   Blaze   Blitz++
Necessary temps            +++      ---     ---     +++     ---
Evaluation strategies      +++      ---     ---     +++     ---
Expression restructuring   +++      ---     ---     +++     ---
Aliasing                   o        o       o       +++     ---

+++: supported, ---: not supported, o: only with correct handling 
Benchmarks 


• A * (a + b) 

• Graph with results 


• (A * B) * (a + b) 
ETs and SET Tools 


• A * (a + b + c) 


• (A * B) + C 
Conclusion 


• ETs can speed up operations between 
mathematical objects by avoiding the temporaries that 
classic operator overloading creates 

• Avoiding these temporaries leads to new problems, which 
can be solved by SETs 

• Not every library that uses ETs/SETs handles the 
problems that come with them 
Question-and-Answer 


Any questions, comments, etc.? 


Thanks for your attention! 